This vignette gives a quick overview on pseudotime analysis and how to apply it on a vgm object. In pseudotime analysis single cells are ordered along a trajectory or lineage and each cell receives a specific pseudotime value which represents its position along that trajectory. This can be used for instance in analysing cell differentiation processes or observing changes in gene expression along lineage specification.
We mainly focus on the usage of two pseudotime libraries: Monocle (version 2 and 3) and slingshot. Monocle3 learns from the fact that cells are differentially expressed and sliced along states of differentiation. These transcriptional changes can be learned from scRNA-seq and used to projects trajectories on dimensionality reduction (on e.g. UMAP plot). After that a root for pseudotime can be chosen and it assigns to each cell a pseudotime value based on the trajectories and root node.
The previous version of Monocle3, Monocle2, uses the DDRTree algorithm to learn a tree structure based on the data and projects clustered cells along the branches of the tree. These trajectories can be used to assign a root branch and pseudotime will be assigned along the branches starting from the root branch.
The slingshot library is similar to Monocle3 in a way that it uses dimensionality reduction to project trajectories. It learns relations between clustered data and infers lineages on a global structure. A main difference to monocle is that slingshot can infer multiple lineages based on one root and each lineage assigns pseudotime values to every single cell for that particular lineage.
#Downloading PlatypusDB raw data in a list format
#For structure of PlatypusDB links, please refer to the PlatypusDB vignette
<-PlatypusDB_fetch(PlatypusDB.links =c(
data_for_VGM "shlesinger2022b/TFH.LCMV.acuteLCMV.10dpi.S1/ALL"
#"shlesinger2022b/TFH.LCMV.chronicLCMV.50dpi.S7/ALL"
#"shlesinger2022b/TFH.LCMV.chronicLCMV.30dpi.S6/ALL"
#"shlesinger2022b/TFH.LCMV.chronicLCMV.10dpi.S4/ALL"
),load.to.enviroment = F,
load.to.list = T,
combine.objects = T)
#integrate VDJ and GEX of the two datasets into a vgm object
<- VDJ_GEX_matrix(Data.in = data_for_VGM, verbose = T,
VGM GEX.integrate = T,
VDJ.combine = T,
integrate.GEX.to.VDJ = T,
integrate.VDJ.to.GEX = T,
exclude.GEX.not.in.VDJ = F,
filter.overlapping.barcodes.GEX = F,
filter.overlapping.barcodes.VDJ = F,
get.VDJ.stats = F,
subsample.barcodes = F,
trim.and.align = F,
#integration.method = "harmony",
group.id = names(data_for_VGM))
After the data is loaded and renamed according to their data type (renaming not shown, only for reasons of clarity) let’s have a look at the UMAP:
::DimPlot(VGM[[2]], reduction = 'umap' ,group.by = "group_id") Seurat
To have a better understanding about the distribution of various cell types in the data set we can visualize the expression of cell markers and their cell type annotations.
First we define a set of genes and the function GEX_gene_visualisation()
visualizes the expression of the gene set in our data in three different plots.
<- GEX_gene_visualization(VGM$GEX,
gene_visual_output gene_set = c('CD3E', 'CD4', 'CD8A', 'CD44'),
group.by = "group_id")
1]]
gene_visual_output[[2]]
gene_visual_output[[3]] gene_visual_output[[
As a next step we are interested in the cell types which are present in our data. For that, we use the GEX_projecTILS()
function. ProjecTILs compares our data with a reference map of different cell types. By doing so it lays projections onto the reference map indicating which cell types are mostly present in our data.
<- GEX_projecTILS(ref_path = refpath,
projecTILs_output GEX = VGM$GEX,
split_by = "group_id",
filtering = TRUE,
NA_cells = FALSE)
#update VGM$GEX with the new updated GEX object
2]] <- projecTILs_output[[1]] VGM[[
3]] projecTILs_output[[
## [[1]]
Here we can see that ProjecTILs projects the data onto the reference map. The black circles and triangles represent the cells which are present in the VGM. We can see that a majority of our cells are either Tfh_Effector or Tfh_Memory cells. projecTILs_output[[1]]
is the updated GEX object with additional meta data (“ProjecTILS_assignment”) containing the ProjecTILs assignments which we will be using for the next steps.
As a next step we use Monocle3 to reconstruct trajectories. You can choose the reduction method (Umap, tsne or pca) and the argument how the cells should be grouped color.cells.by
. Here we use Umap for dimensionality reduction and cluster the data based on their cell types inferred from the projecTILs output.
<- GEX_trajectories(GEX = VGM[[2]],
trajectories_output color.cells.by = 'ProjecTILS_assignment',
reduction.method = 'UMAP',
labels.per.group = 2,
group.label.size = 3)
2]]
trajectories_output[[3]] trajectories_output[[
trajectories_output[[3]]
shows the UMAP reduced cell clusters and the trajectories connecting different cell clusters. This plot will be used to plot pseudotime along the trajectories. For that, pseudotime values need to be assigned to every single cell by choosing a single or multiple root nodes. For this you can define root.nodes
. In our case we choose a node which is located among T_cmp cells (e.g. c(‘Y_93’, ‘Y_157’)).
<- GEX_pseudotime_trajectory_plot(
pseudotime_output 1]],
trajectories_output[[root.nodes = c('Y_93', 'Y_157'))
2]]
pseudotime_output[[
#update VGM[[2]] with the inferred pseudotime from pseudotime_output
2]] <- SeuratObject::AddMetaData(VGM[[2]], SummarizedExperiment::colData(pseudotime_output[[1]])$pseudotime,
VGM[[col.name = "pseudotime")
## [1] "calculating pseudotime trajectories for each cell depending on the chosen root nodes"
If you’re interested in a gene or a set of genes and in their expression levels along trajectories you can use the same function from above with the parameter genes
. This will visualize the UMAP and the trajectory with the specified genes and their expression levels.
<- c('CD3E', 'CD4', 'CD8A', 'CD44')
interesting_genes <- GEX_trajectories(VGM[[2]],
trajectories_output color.cells.by = 'ProjecTILS_assignment',
reduction.method = 'UMAP',
genes = interesting_genes,
group.label.size = 2)
2]]
trajectories_output[[3]] trajectories_output[[
Here again, root.nodes
can be specified for the pseudotime plot as above.
The VDJ_GEX_clonotyme()
function also uses the Monocle3 workflow to infer trajectories and pseudotime. It projects gene expression of specific genes ( highlight.genes
) along pseudotime as well as the expression and distribution of clonotypes (top.N.clonotypes
) along pseudotime. Additionally it plots different cell characteristics along pseudotime (ridgeline.separator
can be any meta data input such as ‘group_id’, ‘seurat_clusters’ etc.). Here we choose the default clonotypes
which displays the top.N.clonotypes = 5
along pseudotime.
<- VDJ_GEX_clonotyme(vdj.gex.matrix.output = VGM, version="v3",
clonotyme_output highlight.genes= c('CD3E','CD4',
'CD8A','CD44'),
root.nodes = c('Y_93', 'Y_157'),
colors = color_palette,
top.N.clonotypes = 5,
ridgeline.separator = 'clonotype')
2]]
clonotyme_output[[3]]
clonotyme_output[[4]] clonotyme_output[[
clonotyme_output[[2]]
shows the distribution of the clustered cell types along pseudotime. clonotyme_output[[3]]
shows the expression of the highlight.genes
along pseudotime and cells colored based on the seurat_clusters identity. clonotyme_output[[4]]
shows the expression and distribution of clonotypes for the specified genes along pseudotime.
Here the ridgeline.separator
was set to the cell types annoted by PRojecTILs.
<- VDJ_GEX_clonotyme(vdj.gex.matrix.output = VGM, version="v3",
clonotyme_output highlight.genes= c('CD3E','CD4',
'CD8A','CD44'),
root.nodes = c('Y_93', 'Y_157'),
colors = color_palette,
top.N.clonotypes = 3,
ridgeline.separator = 'ProjecTILS_assignment')
2]] clonotyme_output[[
The VGM_expanded_clones()
function can be used to infer whether a cell belongs to an expanded clonotype or not. It can be of interest to see how expanded and not-expanded cells behave along pseudotime. For that, we use ridgeline.separator = "expanded_clonotype_frequency"
.
<- VGM_expanded_clones(VGM = VGM,
VGM add.to.VDJ = TRUE,
add.to.GEX = TRUE,
expansion.threshold = 1)
<- VDJ_GEX_clonotyme(vdj.gex.matrix.output = VGM, version="v3",
clonotyme_output_expanded highlight.genes = c('CD3E', 'CD4', 'CD8A', 'CD44'),
root.nodes = c('Y_93', 'Y_157'),
top.N.clonotypes = 3,
ridgeline.separator = "expanded_clonotype_frequency")
2]] clonotyme_output_expanded[[
The VDJ_GEX_clonotyme
function can also be used in combination with module scores for specific genes. For that we create a list of vectors of interesting genes (gene.list
).
First, let’s get the module scores for each gene vector ( In our case there are two). Seurat::AddModuleScore()
adds the score for each module to the VGM. In a next steps these module scores can be visualized over pseudotime by using the parameter genes.for.module.score
.
Important: please keep the naming convention (name = paste0(names(gene.set), "_module")
) in AddModuleScore() as shown. Otherwise VDJ_GEX_clonotyme()
will not recognize the genes.for.module.score
input!
<- list(c('CD3E', 'CD4'),c('CD8A', 'CD44')) # list of vectors of genes
gene.set names(gene.set) <- c("a_genes", "b_genes") #name gene sets
2]] <- Seurat::AddModuleScore(object = VGM[[2]],features = gene.set, name = paste0(names(gene.set), "_module"))
VGM[[
<- VDJ_GEX_clonotyme(vdj.gex.matrix.output = VGM,
clonotyme_output_modules version="v3",
colors = color_palette,
root.nodes = c('Y_93', 'Y_157'),
genes.for.module.score = gene.set)
4]] clonotyme_output_modules[[
## [[1]]
##
## [[2]]
The output of VDJ_GEX_clonotyme
stays the same as above expect for Element [[4]]
which shows the module scores along pseudotime.
There are other trajectory inference methods than Monocle3 and the following function uses the slingshot
library and shows a different approach to trajectory inference. It clusters again the cells and instead of inferring pseudotime trajectories, it displays multiple lineage trajectories based on the clustered cells. The root cluster has to be defined beforehand, here we take cluster 5 since that’s where the most T_cmp cells are located as our starting root.
::DimPlot(VGM[[2]], reduction = "umap",
Seuratgroup.by = "seurat_clusters", pt.size = 0.5, label = TRUE)
<- GEX_lineage_trajectories(VGM[[2]], cluster.num = 5) lineage_trajectory_output